Stochastic Gradient Descent
- Update the weights using a single randomly chosen training example per step instead of the whole dataset.
- Each gradient estimate is very noisy
- but each update is cheap
- in practice much faster than batch gradient descent for machine learning workloads
- this works because training samples are highly redundant, so a small subset already carries most of the gradient signal
- The main reason to use mini-batches at all is that hardware processes batches more efficiently
- the gradient computation is embarrassingly parallel across samples, and batching is the natural way to exploit that parallelism (see the sketch below)
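
A minimal NumPy sketch contrasting the two update rules on a least-squares problem. The data, learning rate, epoch count, and batch size are illustrative assumptions, not from the notes: pure SGD takes one sample per step (noisy, cheap), while the mini-batch variant vectorizes the same computation to keep the hardware busy.

```python
import numpy as np

# Synthetic least-squares data (illustrative, not from the notes).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                  # 1000 samples, 5 features
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

lr = 0.01

# Pure SGD: one randomly chosen sample per update.
w = np.zeros(5)
for epoch in range(5):
    for i in rng.permutation(len(X)):
        grad = (X[i] @ w - y[i]) * X[i]         # gradient of 0.5*(x·w - y)^2
        w -= lr * grad

# Mini-batch SGD: same update, averaged over a batch,
# computed as one vectorized matrix operation.
w_mb = np.zeros(5)
batch = 32
for epoch in range(5):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        grad = X[b].T @ (X[b] @ w_mb - y[b]) / len(b)
        w_mb -= lr * grad

print("SGD weights:       ", np.round(w, 2))
print("Mini-batch weights:", np.round(w_mb, 2))
```

Both loops do the same number of passes over the data; the mini-batch version simply trades per-step noise for vectorized throughput, which is the hardware-efficiency point made above.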